27 research outputs found

    Harnessing large language models (LLMs) for candidate gene prioritization and selection.

    BACKGROUND: Feature selection is a critical step for translating advances afforded by systems-scale molecular profiling into actionable clinical insights. While data-driven methods are commonly utilized for selecting candidate genes, knowledge-driven methods must contend with the challenge of efficiently sifting through extensive volumes of biomedical information. This work aimed to assess the utility of large language models (LLMs) for knowledge-driven gene prioritization and selection. METHODS: In this proof of concept, we focused on 11 blood transcriptional modules associated with an erythroid cell signature. We evaluated four leading LLMs across multiple tasks. Next, we established a workflow leveraging LLMs. The steps consisted of: (1) Selecting one of the 11 modules; (2) Identifying functional convergences among constituent genes using the LLMs; (3) Scoring candidate genes across six criteria capturing the gene's biological and clinical relevance; (4) Prioritizing candidate genes and summarizing justifications; (5) Fact-checking justifications and identifying supporting references; (6) Selecting a top candidate gene based on validated scoring justifications; and (7) Factoring in transcriptome profiling data to finalize the selection of the top candidate gene. RESULTS: Of the four LLMs evaluated, OpenAI's GPT-4 and Anthropic's Claude demonstrated the best performance and were chosen for the implementation of the candidate gene prioritization and selection workflow. This workflow was run in parallel for each of the 11 erythroid cell modules by participants in a data mining workshop. Module M9.2 served as an illustrative use case. The 30 candidate genes forming this module were assessed, and the top five scoring genes were identified as BCL2L1, ALAS2, SLC4A1, CA1, and FECH. Researchers carefully fact-checked the summarized scoring justifications, after which the LLMs were prompted to select a top candidate based on this information. GPT-4 initially chose BCL2L1, while Claude selected ALAS2. When transcriptional profiling data from three reference datasets were provided for additional context, GPT-4 revised its initial choice to ALAS2, whereas Claude reaffirmed its original selection for this module. CONCLUSIONS: Taken together, our findings highlight the ability of LLMs to prioritize candidate genes with minimal human intervention. This suggests the potential of this technology to boost productivity, especially for tasks that require leveraging extensive biomedical knowledge.
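
Steps (3) and (4) of the workflow above, scoring genes across several criteria and then prioritizing them, could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `rank_candidates`, the unweighted sum used to combine criteria, the criterion labels, and the placeholder gene names and scores are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def rank_candidates(
    scores: Dict[str, Dict[str, float]], top_n: int = 5
) -> List[Tuple[str, float]]:
    """Rank genes by the unweighted sum of their per-criterion scores.

    `scores` maps gene -> {criterion name -> score}. Summing with equal
    weights is an assumption; the abstract does not say how the six
    criteria were combined.
    """
    totals = {gene: sum(by_criterion.values()) for gene, by_criterion in scores.items()}
    # Sort by descending total, breaking ties alphabetically for determinism.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical placeholder data (not taken from the study).
example_scores = {
    "GENE_A": {"criterion_1": 8, "criterion_2": 6},
    "GENE_B": {"criterion_1": 9, "criterion_2": 9},
    "GENE_C": {"criterion_1": 3, "criterion_2": 4},
}

top_two = rank_candidates(example_scores, top_n=2)
```

In the workflow described, the scores and their justifications came from the LLMs and were fact-checked by researchers before a top candidate was selected, so a ranking like this would only be an intermediate step.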

    Organizing gene literature retrieval, profiling, and visualization training workshops for early career researchers

    Developing the skills needed to effectively search and extract information from biomedical literature is essential for early-career researchers. It is, for instance, on this basis that the novelty of experimental results, and therefore publishing opportunities, can be evaluated. Given the unprecedented volume of publications in the field of biomedical research, new systematic approaches need to be devised and adopted for the retrieval and curation of literature relevant to a specific theme. Here we describe a hands-on training curriculum aimed at retrieval, profiling, and visualization of literature associated with a given topic. This curriculum was implemented in a workshop in January 2021. We provide supporting material and step-by-step implementation guidelines with the ISG15 gene literature serving as an illustrative use case. Through participation in such a workshop, trainees can learn: 1) to build and troubleshoot PubMed queries in order to retrieve the literature associated with a gene of interest; 2) to identify key concepts relevant to given themes (such as cell types, diseases, and biological processes); 3) to measure the prevalence of these concepts in the gene literature; 4) to extract key information from relevant articles, and 5) to develop a background section or summary on the basis of this information. Finally, trainees can learn to consolidate the structured information captured through this process for presentation via an interactive web application
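
As a rough illustration of learning objectives 1) and 3) above, the sketch below assembles a PubMed query string for a gene and measures how often given concepts appear in a set of retrieved titles. It is not part of the workshop material: the helper names, the choice of the `[Title/Abstract]` field tag, the placeholder alias, and the sample titles are assumptions, and no request is actually sent to PubMed.

```python
from typing import Dict, Iterable, List


def build_pubmed_query(gene_terms: Iterable[str]) -> str:
    """Join a gene symbol and its aliases into one OR query on title/abstract."""
    clauses = [f'"{term}"[Title/Abstract]' for term in gene_terms]
    return " OR ".join(clauses)


def concept_prevalence(titles: List[str], concepts: Iterable[str]) -> Dict[str, float]:
    """Return, for each concept, the fraction of titles that mention it.

    Matching is a plain case-insensitive substring test, which is a
    simplification; real curation would need synonym handling.
    """
    concept_list = list(concepts)
    if not titles:
        return {concept: 0.0 for concept in concept_list}
    lowered = [title.lower() for title in titles]
    return {
        concept: sum(concept.lower() in title for title in lowered) / len(lowered)
        for concept in concept_list
    }


# "ALIAS_1" is a placeholder, not a real alias of ISG15.
query = build_pubmed_query(["ISG15", "ALIAS_1"])

# Hypothetical titles standing in for a retrieved literature set.
sample_titles = [
    "Interferon response in monocytes",
    "A study of neutrophils",
    "Monocytes and sepsis",
    "Unrelated title",
]
prevalence = concept_prevalence(sample_titles, ["monocytes", "sepsis"])
```

Keeping query construction separate from prevalence counting mirrors the curriculum's split between retrieving the literature and profiling it.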

    Immunomodulatory effects of vitamin D supplementation in a deficient population

    In addition to its canonical functions, vitamin D has been proposed to be an important mediator of the immune system. Despite ample sunshine, vitamin D deficiency is prevalent (>80%) in the Middle East, resulting in a high rate of supplementation. However, the underlying molecular mechanisms of the specific regimen prescribed and the potential factors affecting an individual’s response to vitamin D supplementation are not well characterized. Our objective is to describe the changes in the blood transcriptome and explore the potential mechanisms associated with vitamin D3 supplementation in one hundred vitamin D-deficient women who were given a weekly oral dose (50,000 IU) of vitamin D3 for three months. A high-throughput targeted PCR, composed of 264 genes representing the important blood transcriptomic fingerprints of health and disease states, was performed on pre- and post-supplementation blood samples to profile the molecular response to vitamin D3. We identified 54 differentially expressed genes that were strongly modulated by vitamin D3 supplementation. Network analyses showed significant changes in the immune-related pathways such as TLR4/CD14 and IFN receptors, and catabolic processes related to NF-kB, which were subsequently confirmed by gene ontology enrichment analyses. We proposed a model for vitamin D3 response based on the expression changes of molecules involved in the receptor-mediated intracellular signaling pathways and the ensuing predicted effects on cytokine production. Overall, vitamin D3 has a strong effect on the immune system, G protein-coupled receptor signaling, and the ubiquitin system. We highlighted the major molecular changes and biological processes induced by vitamin D3, which will help to further investigate the effectiveness of vitamin D3 supplementation among individuals in the Middle East as well as other regions. Funding: This work was supported by a National Capacity Building Program grant from Qatar University (ID# QUCP-CHS-17\18-1).

    A modular framework for the development of targeted Covid-19 blood transcript profiling panels

    Covid-19 morbidity and mortality are associated with a dysregulated immune response. Tools are needed to enhance existing immune profiling capabilities in affected patients. Here we aimed to develop an approach to support the design of targeted blood transcriptome panels for profiling the immune response to SARS-CoV-2 infection. We designed a pool of candidates based on a pre-existing and well-characterized repertoire of blood transcriptional modules. Available Covid-19 blood transcriptome data was also used to guide this process. Further selection steps relied on expert curation. Additionally, we developed several custom web applications to support the evaluation of candidates. As a proof of principle, we designed three targeted blood transcript panels, each with a different translational connotation: immunological relevance, therapeutic development relevance and SARS biology relevance. Altogether the work presented here may contribute to the future expansion of immune profiling capabilities via targeted profiling of blood transcript abundance in Covid-19 patients.

    Application of a gene modular approach for clinical phenotype genotype association and sepsis prediction using machine learning in meningococcal sepsis

    Sepsis is a major global health concern causing high morbidity and mortality rates. Our study utilized a Meningococcal Septic Shock (MSS) temporal dataset to investigate the correlation between gene expression (GE) changes and clinical features. The research used Weighted Gene Co-expression Network Analysis (WGCNA) to establish links between gene expression and clinical parameters in infants admitted to the Pediatric Critical Care Unit with MSS. Additionally, various machine learning (ML) algorithms, including Support Vector Machine (SVM), Naive Bayes, K-Nearest Neighbors (KNN), Decision Tree, Random Forest, and Artificial Neural Network (ANN) were implemented to predict sepsis survival. The findings revealed a transition in gene function pathways from nuclear to cytoplasmic to extracellular, corresponding with Pediatric Logistic Organ Dysfunction score (PELOD) readings at 0, 24, and 48 h. ANN was the most accurate of the six ML models applied for survival prediction. This study successfully correlated PELOD with transcriptomic data, mapping enriched GE modules in acute sepsis. By integrating network analysis methods to identify key gene modules and using machine learning for sepsis prognosis, this study offers valuable insights for precision-based treatment strategies in future research. The observed temporal-spatial pattern of cellular recovery in sepsis could prove useful in guiding clinical management and therapeutic interventions

    Development of a fixed module repertoire for the analysis and interpretation of blood transcriptome data.

    As the capacity for generating large-scale molecular profiling data continues to grow, the ability to extract meaningful biological knowledge from it remains a limitation. Here, we describe the development of a new fixed repertoire of transcriptional modules, BloodGen3, that is designed to serve as a stable reusable framework for the analysis and interpretation of blood transcriptome data. The construction of this repertoire is based on co-clustering patterns observed across sixteen immunological and physiological states encompassing 985 blood transcriptome profiles. Interpretation is supported by customized resources, including module-level analysis workflows, fingerprint grid plot visualizations, interactive web applications and an extensive annotation framework comprising functional profiling reports and reference transcriptional profiles. Taken together, this well-characterized and well-supported transcriptional module repertoire can be employed for the interpretation and benchmarking of blood transcriptome profiles within and across patient cohorts. Blood transcriptome fingerprints for the 16 reference cohorts can be accessed interactively via: https://drinchai.shinyapps.io/BloodGen3Module/

    Abundance of ACVR1B transcript is elevated during septic conditions: Perspectives obtained from a hands-on reductionist investigation

    Sepsis is a complex heterogeneous condition, and the current lack of effective risk and outcome predictors hinders the improvement of its management. Using a reductionist approach leveraging publicly available transcriptomic data, we describe a knowledge gap for the role of ACVR1B (activin A receptor type 1B) in sepsis. ACVR1B, a member of the transforming growth factor-beta (TGF-beta) superfamily, was selected based on the following: 1) induction upon in vitro exposure of neutrophils from healthy subjects to the serum of septic patients (GSE49755), and 2) absence or minimal overlap between ACVR1B, sepsis, inflammation, or neutrophil in published literature. Moreover, ACVR1B expression is upregulated in septic melioidosis, a widespread cause of fatal sepsis in the tropics. Key biological concepts extracted from a series of PubMed queries established indirect links between ACVR1B and “cancer”, “TGF-beta superfamily”, “cell proliferation”, “inhibitors of activin”, and “apoptosis”. We confirmed our observations by measuring ACVR1B transcript abundance in buffy coat samples obtained from healthy individuals (n = 3) exposed to septic plasma (n = 26 melioidosis sepsis cases) ex vivo. Based on our re-investigation of publicly available transcriptomic data and newly generated ex vivo data, we provide perspective on the role of ACVR1B during sepsis. Additional experiments for addressing this knowledge gap are discussed.

    A Transcriptomic Appreciation of Childhood Meningococcal and Polymicrobial Sepsis from a Pro-Inflammatory and Trajectorial Perspective, a Role for Vascular Endothelial Growth Factor A and B Modulation?

    This study investigated the temporal dynamics of childhood sepsis by analyzing gene expression changes associated with proinflammatory processes. Five datasets, including four meningococcal sepsis shock (MSS) datasets (two temporal and two longitudinal) and one polymicrobial sepsis dataset, were selected to track temporal changes in gene expression. Hierarchical clustering revealed three temporal phases: early, intermediate, and late, providing a framework for understanding sepsis progression. Principal component analysis supported the identification of gene expression trajectories. Differential gene analysis highlighted consistent upregulation of vascular endothelial growth factor A (VEGF-A) and nuclear factor κB1 (NFKB1), genes involved in inflammation, across the sepsis datasets. NFKB1 gene expression also showed temporal changes in the MSS datasets. In the postmortem dataset comparing MSS cases to controls, VEGF-A was upregulated and VEGF-B downregulated. Renal tissue exhibited higher VEGF-A expression compared with other tissues. Similar VEGF-A upregulation and VEGF-B downregulation patterns were observed in the cross-sectional MSS datasets and the polymicrobial sepsis dataset. Hexagonal plots confirmed VEGF-R (VEGF receptor)–VEGF-R2 signaling pathway enrichment in the MSS cross-sectional studies. The polymicrobial sepsis dataset also showed enrichment of the VEGF pathway in septic shock day 3 and sepsis day 3 samples compared with controls. These findings provide unique insights into the dynamic nature of sepsis from a transcriptomic perspective and suggest potential implications for biomarker development. Future research should focus on larger-scale temporal transcriptomic studies with appropriate control groups and validate the identified gene combination as a potential biomarker panel for sepsis

    Advancing sepsis clinical research: harnessing transcriptomics for an omics-based strategy - a comprehensive scoping review

    Sepsis continues to be recognized as a significant global health challenge across all ages and is characterized by a complex pathophysiology. In this scoping review, PRISMA-ScR guidelines were adhered to, and a transcriptomic methodology was adopted, with the protocol registered on the Open Science Framework. We hypothesized that gene expression analysis could provide a foundation for establishing a clinical research framework for sepsis. A comprehensive search of the PubMed database was conducted with a particular focus on original research and systematic reviews of transcriptomic sepsis studies published between 2012 and 2022. Both coding and non-coding gene expression studies have been included in this review. An effort was made to enhance the understanding of sepsis at the mRNA gene expression level by applying a systems biology approach through transcriptomic analysis. Seven crucial components related to sepsis research were addressed in this study: endotyping (n = 64), biomarker (n = 409), definition (n = 0), diagnosis (n = 1098), progression (n = 124), severity (n = 451), and benchmark (n = 62). These components were classified into two groups, with one focusing on Biomarkers and Endotypes and the other oriented towards clinical aspects. Our review of the selected studies revealed a compelling association between gene transcripts and clinical sepsis, reinforcing the proposed research framework. Nevertheless, challenges have arisen from the lack of consensus in the sepsis terminology employed in research studies and the absence of a comprehensive definition of sepsis. There is a gap in the alignment between the notion of sepsis as a clinical phenomenon and that of laboratory indicators. It is potentially responsible for the variable number of patients within each category. Ideally, future studies should incorporate a transcriptomic perspective. 
The integration of transcriptomic data with clinical endpoints holds significant potential for advancing sepsis research, facilitating a consensus-driven approach, and enabling the precision management of sepsis

    High–temporal resolution profiling reveals distinct immune trajectories following the first and second doses of COVID-19 mRNA vaccines

    Knowledge of the mechanisms underpinning the development of protective immunity conferred by mRNA vaccines is fragmentary. Here, we investigated responses to coronavirus disease 2019 (COVID-19) mRNA vaccination via high–temporal resolution blood transcriptome profiling. The first vaccine dose elicited modest interferon and adaptive immune responses, which peaked on days 2 and 5, respectively. The second vaccine dose, in contrast, elicited sharp day 1 interferon, inflammation, and erythroid cell responses, followed by a day 5 plasmablast response. Both post-first and post-second dose interferon signatures were associated with the subsequent development of antibody responses. Yet, we observed distinct interferon response patterns after each of the doses that may reflect quantitative or qualitative differences in interferon induction. Distinct interferon response phenotypes were also observed in patients with COVID-19 and were associated with severity and differences in duration of intensive care. Together, this study also highlights the benefits of adopting high-frequency sampling protocols in profiling vaccine-elicited immune responses